Goto

Collaborating Authors

 exact gaussian process


Exact Gaussian Processes on a Million Data Points

Neural Information Processing Systems

Gaussian processes (GPs) are flexible non-parametric models, with a capacity that grows with the available data. However, computational constraints with standard inference procedures have limited exact GPs to problems with fewer than about ten thousand training points, necessitating approximations for larger datasets. In this paper, we develop a scalable approach for exact GPs that leverages multi-GPU parallelization and methods like linear conjugate gradients, accessing the kernel matrix only through matrix multiplication. By partitioning and distributing kernel matrix multiplies, we demonstrate that an exact GP can be trained on over a million points, a task previously thought to be impossible with current computing hardware. Moreover, our approach is generally applicable, without constraints to grid data or specific kernel classes. Enabled by this scalability, we perform the first-ever comparison of exact GPs against scalable GP approximations on datasets with $10^4 \!-\! 10^6$ data points, showing dramatic performance improvements.


Reviews: Exact Gaussian Processes on a Million Data Points

Neural Information Processing Systems

A good result from this rebuttal exercise is that the authors now report test likelihoods, which I think should always be reported in GP papers. It will be great if the authors can be a bit more explicit about the additional contributions with respect to the previous work [11,26]. It is OK to focus on the engineering effort and experimental evaluation as the main contribution of the paper. The paper will also benefit from reporting error bars on all the results. The paper does not provide new theoretical developments or a technical contribution.


Exact Gaussian Processes on a Million Data Points

Neural Information Processing Systems

Gaussian processes (GPs) are flexible non-parametric models, with a capacity that grows with the available data. However, computational constraints with standard inference procedures have limited exact GPs to problems with fewer than about ten thousand training points, necessitating approximations for larger datasets. In this paper, we develop a scalable approach for exact GPs that leverages multi-GPU parallelization and methods like linear conjugate gradients, accessing the kernel matrix only through matrix multiplication. By partitioning and distributing kernel matrix multiplies, we demonstrate that an exact GP can be trained on over a million points, a task previously thought to be impossible with current computing hardware. Moreover, our approach is generally applicable, without constraints to grid data or specific kernel classes. Enabled by this scalability, we perform the first-ever comparison of exact GPs against scalable GP approximations on datasets with 10 4 \!-\! 10 6 data points, showing dramatic performance improvements.


Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels

Noack, Marcus M., Krishnan, Harinarayan, Risser, Mark D., Reyes, Kristofer G.

arXiv.org Machine Learning

A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. This success is largely attributed to the GP's analytical tractability, robustness, non-parametric structure, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of $O(N^3)$ in computation and $O(N^2)$ in storage. All existing methods addressing this issue utilize some form of approximation -- usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user's flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally-occurring sparsity by allowing the kernel to discover -- instead of induce -- sparse structure. The premise of this paper is that GPs, in their most native form, are often naturally sparse, but commonly-used kernels do not allow us to exploit this sparsity. The core concept of exact, and at the same time sparse GPs relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly-supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.


Exact Gaussian Processes on a Million Data Points

Wang, Ke, Pleiss, Geoff, Gardner, Jacob, Tyree, Stephen, Weinberger, Kilian Q., Wilson, Andrew Gordon

Neural Information Processing Systems

Gaussian processes (GPs) are flexible non-parametric models, with a capacity that grows with the available data. However, computational constraints with standard inference procedures have limited exact GPs to problems with fewer than about ten thousand training points, necessitating approximations for larger datasets. In this paper, we develop a scalable approach for exact GPs that leverages multi-GPU parallelization and methods like linear conjugate gradients, accessing the kernel matrix only through matrix multiplication. By partitioning and distributing kernel matrix multiplies, we demonstrate that an exact GP can be trained on over a million points, a task previously thought to be impossible with current computing hardware. Moreover, our approach is generally applicable, without constraints to grid data or specific kernel classes.